Abstract: The term data mining is often applied to two
separate processes: knowledge discovery and prediction.
Knowledge discovery provides explicit information that has a
readable form and can be understood by a user. Forecasting,
or predictive modeling, provides predictions of future events
and may be transparent and readable in some approaches
(e.g. rule-based systems) and opaque in others, such as
neural networks. Moreover, some data mining systems, such as
neural networks, are inherently geared towards prediction and
pattern recognition rather than knowledge discovery. Utility
item set mining is an extension of frequent pattern mining.
The goal of high utility item set mining is to find all item
sets whose utility is greater than or equal to a
user-specified threshold. The deficiency of this approach is
that it does not consider the statistical aspect of item
sets. Utility-based measures should incorporate user-defined
utility as well as raw statistical aspects of the data.
Consequently, it is meaningful to define a specialized form
of high utility item sets, utility-frequent item sets, which
are a subset of high utility item sets as well as of frequent
item sets. In this paper we propose an efficient approach to
mine high utility item sets from transactional records.

Keywords: Utility, Candidates, Transactions, Thresholds,
Item set.

I. INTRODUCTION 

Mining high utility item sets upgrades the standard frequent
item set mining framework, as it employs subjectively defined
utility instead of a statistics-based support measure.
User-defined utility is based on information not available in
the transaction dataset. It often reflects user preference
and can be represented by an external utility table or
utility function. The utility table (or function) defines the
utilities of all items in a given database (we can also treat
them as weights). Besides subjective external utility, we
also need transaction-dependent internal utilities (e.g.
quantities of items in transactions). The utility function we
use to compute the utility of an item set takes into account
both the internal and external utility of all items in the
item set. The most usual form, which is also used here, is
the sum of products of the internal and external utilities of
the items present.



Fig. 1.1: Utility means many things 

II. ARCHITECTURE 



Fig. 2: Architecture 
The user gives a minimum threshold value. We calculate the
total utility value over the entire database and compare it
with the given threshold value. The item sets that satisfy
the given condition are known as high utility item sets.

III. BACKGROUND 

Consider a simple example of a transactional database.

Table 1: Transactional data set

TID    I1   I2   I3   I4   I5
T1      0    0   18    0    1
T2      0    6    0    1    1
T3      2    0    1    0    1
T4      1    0    0    1    1
T5      0    0    4    0    2
T6      1    1    0    0    0
T7      0   10    0    1    1
T8      3    0   25    3    1
T9      1    1    0    0    0
T10     0    6    2    0    2

In the utility table (Table 2), the right column displays the
profit of each item per unit in dollars.

Table 2: Profit table

ITEM   PROFIT ($) (per unit)
I1      3
I2     10
I3      1
I4      6
I5      5


External utility: The external utility of an item $i_p$ is a
numerical value $y_p$ defined by the user. It is transaction
independent and reflects the importance (usually profit) of
the item. External utilities are stored in a utility table.
For example, the external utility of item I2 in Table 2 is 10.

Internal utility: The internal utility of an item $i_p$ is a
numerical value $x_p$ which is transaction dependent. In most
cases it is defined as the quantity of the item in a
transaction. For example, the internal utility of item I5 in
transaction T5 in Table 1 is 2.

The utility of an item: The utility of item $i_p$ in
transaction $T$ is the quantitative measure computed with the
utility function from the above definitions,
$u(i_p, T) = f(x_p, y_p)$, $i_p \in T$. For example, the
utility of item I5 in transaction T5 is 2 * 5 = 10.
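
A minimal sketch of this definition, assuming the product form
f(x_p, y_p) = x_p * y_p that the worked example uses. The names
`PROFIT`, `T5` and `item_utility` are illustrative and do not
come from the paper.

```python
# External utilities: per-unit profit of each item (Table 2).
PROFIT = {"I1": 3, "I2": 10, "I3": 1, "I4": 6, "I5": 5}

# Internal utilities (quantities) of transaction T5 in Table 1.
# Items with quantity 0 are omitted.
T5 = {"I3": 4, "I5": 2}


def item_utility(item, transaction, profit=PROFIT):
    """u(i_p, T) = x_p * y_p, assuming the product form of f."""
    return transaction.get(item, 0) * profit[item]


print(item_utility("I5", T5))  # 2 * 5 = 10
```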
The utility of item set S in transaction T: The utility of
item set $S$ in transaction $T$ is defined as

$$u(S, T) = \sum_{i_p \in S} u(i_p, T), \quad S \subseteq T.$$

For example, the utility of item set {I2, I5} in transaction
T2 is u({I2, I5}, T2) = u({I2}, T2) + u({I5}, T2)
= 6 * 10 + 1 * 5 = 65.
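
The same computation as a short sketch. Returning 0 when S is
not contained in T is a convention assumed here for
convenience; the definition itself only covers the case where
S is a subset of T. All names are illustrative.

```python
# External utilities: per-unit profit of each item (Table 2).
PROFIT = {"I1": 3, "I2": 10, "I3": 1, "I4": 6, "I5": 5}

# Quantities of transaction T2 in Table 1 (zero entries omitted).
T2 = {"I2": 6, "I4": 1, "I5": 1}


def itemset_utility_in_transaction(itemset, transaction, profit=PROFIT):
    """u(S, T): sum of the item utilities of S inside one transaction."""
    if not set(itemset).issubset(transaction):
        return 0  # assumed convention: S does not occur in T
    return sum(transaction[i] * profit[i] for i in itemset)


print(itemset_utility_in_transaction({"I2", "I5"}, T2))  # 60 + 5 = 65
```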

The utility of item $i_p$ in item set S: The utility of item
$i_p$ in item set $S$ is defined as

$$u(i_p, S) = \sum_{T \in DB,\ S \subseteq T} u(i_p, T).$$

For example, the utility of item I5 in item set {I2, I5} is
u(I5, {I2, I5}) = u(I5, T2) + u(I5, T7) + u(I5, T10) = 20.
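
A sketch of this sum over the example database, assuming the
product form of the utility function. `DB`, `PROFIT` and the
function name are illustrative.

```python
PROFIT = {"I1": 3, "I2": 10, "I3": 1, "I4": 6, "I5": 5}  # Table 2
DB = {  # Table 1 quantities, zero entries omitted
    "T1": {"I3": 18, "I5": 1},
    "T2": {"I2": 6, "I4": 1, "I5": 1},
    "T3": {"I1": 2, "I3": 1, "I5": 1},
    "T4": {"I1": 1, "I4": 1, "I5": 1},
    "T5": {"I3": 4, "I5": 2},
    "T6": {"I1": 1, "I2": 1},
    "T7": {"I2": 10, "I4": 1, "I5": 1},
    "T8": {"I1": 3, "I3": 25, "I4": 3, "I5": 1},
    "T9": {"I1": 1, "I2": 1},
    "T10": {"I2": 6, "I3": 2, "I5": 2},
}


def item_utility_in_itemset(item, itemset, db=DB, profit=PROFIT):
    """u(i_p, S): utility of `item` over all transactions containing S.

    `item` is expected to be a member of `itemset`.
    """
    s = set(itemset)
    return sum(t[item] * profit[item] for t in db.values() if s.issubset(t))


print(item_utility_in_itemset("I5", {"I2", "I5"}))  # 5 + 5 + 10 = 20
```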

The utility of item set S in database DB: The utility of
item set $S$ in database $DB$ is defined as

$$u(S) = \sum_{T \in DB,\ S \subseteq T} u(S, T) = \sum_{T \in DB,\ S \subseteq T} \ \sum_{i_p \in S} u(i_p, T).$$

For example, the utility of item set {I1, I5} in the database
from Table 1 is

u({I1, I5}) = u({I1, I5}, T3) + u({I1, I5}, T4)
+ u({I1, I5}, T8) = 33.
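
The double sum can be sketched directly, again assuming the
product form of the utility function. The names are
illustrative.

```python
PROFIT = {"I1": 3, "I2": 10, "I3": 1, "I4": 6, "I5": 5}  # Table 2
DB = {  # Table 1 quantities, zero entries omitted
    "T1": {"I3": 18, "I5": 1},
    "T2": {"I2": 6, "I4": 1, "I5": 1},
    "T3": {"I1": 2, "I3": 1, "I5": 1},
    "T4": {"I1": 1, "I4": 1, "I5": 1},
    "T5": {"I3": 4, "I5": 2},
    "T6": {"I1": 1, "I2": 1},
    "T7": {"I2": 10, "I4": 1, "I5": 1},
    "T8": {"I1": 3, "I3": 25, "I4": 3, "I5": 1},
    "T9": {"I1": 1, "I2": 1},
    "T10": {"I2": 6, "I3": 2, "I5": 2},
}


def itemset_utility(itemset, db=DB, profit=PROFIT):
    """u(S): sum of u(S, T) over every transaction T that contains S."""
    s = set(itemset)
    return sum(
        t[i] * profit[i]
        for t in db.values()
        if s.issubset(t)
        for i in s
    )


print(itemset_utility({"I1", "I5"}))  # 11 + 8 + 14 = 33
```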

The utility of transaction T: The utility of transaction $T$
is defined as

$$u(T) = \sum_{i_p \in T} u(i_p, T).$$

For example, the utility of transaction T10 is

u(T10) = u({I2}, T10) + u({I3}, T10) + u({I5}, T10)
= 60 + 2 + 10 = 72.
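
As a quick check of the example, a sketch of the transaction
utility under the product form of the utility function. The
names are illustrative.

```python
PROFIT = {"I1": 3, "I2": 10, "I3": 1, "I4": 6, "I5": 5}  # Table 2

# Quantities of T1 and T10 in Table 1 (zero entries omitted).
T1 = {"I3": 18, "I5": 1}
T10 = {"I2": 6, "I3": 2, "I5": 2}


def transaction_utility(transaction, profit=PROFIT):
    """u(T): sum of the utilities of all items present in T."""
    return sum(qty * profit[item] for item, qty in transaction.items())


print(transaction_utility(T10))  # 60 + 2 + 10 = 72
print(transaction_utility(T1))   # 18 + 5 = 23
```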

The utility of database DB: The utility of database $DB$ is
defined as

$$u(DB) = \sum_{T \in DB} u(T).$$

For example, the utility of database DB from Table 1 and
Table 2 is

u(DB) = u(T1) + ... + u(T10) = 23 + ... + 72 = 400.
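
The elided terms of this sum can be reproduced with a short
sketch (product form of the utility function assumed, names
illustrative).

```python
PROFIT = {"I1": 3, "I2": 10, "I3": 1, "I4": 6, "I5": 5}  # Table 2
DB = {  # Table 1 quantities, zero entries omitted
    "T1": {"I3": 18, "I5": 1},
    "T2": {"I2": 6, "I4": 1, "I5": 1},
    "T3": {"I1": 2, "I3": 1, "I5": 1},
    "T4": {"I1": 1, "I4": 1, "I5": 1},
    "T5": {"I3": 4, "I5": 2},
    "T6": {"I1": 1, "I2": 1},
    "T7": {"I2": 10, "I4": 1, "I5": 1},
    "T8": {"I1": 3, "I3": 25, "I4": 3, "I5": 1},
    "T9": {"I1": 1, "I2": 1},
    "T10": {"I2": 6, "I3": 2, "I5": 2},
}

# u(T) for every transaction, in the order T1..T10.
per_transaction = {
    tid: sum(qty * PROFIT[item] for item, qty in t.items())
    for tid, t in DB.items()
}
database_utility = sum(per_transaction.values())

print(list(per_transaction.values()))
# [23, 71, 12, 14, 14, 13, 111, 57, 13, 72]
print(database_utility)  # 400
```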

The utility share of item set S in database: The utility
share of item set $S$ in database $DB$ is

$$U(S) = \frac{u(S)}{u(DB)}.$$

For example, the utility share of item set {I1, I4, I5} in
the database from Table 1 is
U({I1, I4, I5}) = 46/400 = 0.115 = 11.5%.
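
A sketch of the utility share as the ratio of the item set
utility to the database utility, which is the form the example
46/400 implies. The function names are illustrative.

```python
PROFIT = {"I1": 3, "I2": 10, "I3": 1, "I4": 6, "I5": 5}  # Table 2
DB = {  # Table 1 quantities, zero entries omitted
    "T1": {"I3": 18, "I5": 1},
    "T2": {"I2": 6, "I4": 1, "I5": 1},
    "T3": {"I1": 2, "I3": 1, "I5": 1},
    "T4": {"I1": 1, "I4": 1, "I5": 1},
    "T5": {"I3": 4, "I5": 2},
    "T6": {"I1": 1, "I2": 1},
    "T7": {"I2": 10, "I4": 1, "I5": 1},
    "T8": {"I1": 3, "I3": 25, "I4": 3, "I5": 1},
    "T9": {"I1": 1, "I2": 1},
    "T10": {"I2": 6, "I3": 2, "I5": 2},
}


def itemset_utility(itemset, db=DB, profit=PROFIT):
    """u(S) over all transactions that contain S."""
    s = set(itemset)
    return sum(t[i] * profit[i] for t in db.values() if s.issubset(t) for i in s)


def utility_share(itemset, db=DB, profit=PROFIT):
    """U(S) = u(S) / u(DB)."""
    total = sum(q * profit[i] for t in db.values() for i, q in t.items())
    return itemset_utility(itemset, db, profit) / total


print(f"{utility_share({'I1', 'I4', 'I5'}):.1%}")  # 11.5%
```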

IV. CATEGORIES OF UTILITY MINING ALGORITHMS

Utility mining algorithms are mainly classified into three
categories. The first category uses a top-down approach, the
second a bottom-up approach, and the third an approach based
on the Frequent Pattern Growth tree. Top-down algorithms are
based on closure properties: they generate a candidate set
and then use a pruning strategy to remove useless candidates.
Tree-based algorithms use utility pattern growth tree
concepts to generate useful patterns. These algorithms use a
tree-like structure that contains a main root node and
subtrees.

[Figure: Utility Mining Algorithms]


V. LITERATURE REVIEW 

In 1994, Agrawal proposed the mining of association rules
for finding the relationships between data items in large
databases.

In 2003, Chan observed that the candidate set pruning
strategy used in the classical algorithm, which exploits the
anti-monotone property, does not hold for utility mining.
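
A small check on the example data of Table 1 and Table 2
illustrates this observation. Support never increases when an
item set grows, but utility can: {I2, I5} is far more
valuable than its subset {I5}. The sketch assumes the product
form of the utility function; the function names are
illustrative.

```python
PROFIT = {"I1": 3, "I2": 10, "I3": 1, "I4": 6, "I5": 5}  # Table 2
DB = [  # Table 1 quantities for T1..T10, zero entries omitted
    {"I3": 18, "I5": 1},
    {"I2": 6, "I4": 1, "I5": 1},
    {"I1": 2, "I3": 1, "I5": 1},
    {"I1": 1, "I4": 1, "I5": 1},
    {"I3": 4, "I5": 2},
    {"I1": 1, "I2": 1},
    {"I2": 10, "I4": 1, "I5": 1},
    {"I1": 3, "I3": 25, "I4": 3, "I5": 1},
    {"I1": 1, "I2": 1},
    {"I2": 6, "I3": 2, "I5": 2},
]


def support(itemset, db=DB):
    """Number of transactions that contain the item set."""
    return sum(1 for t in db if set(itemset).issubset(t))


def utility(itemset, db=DB, profit=PROFIT):
    """u(S) over all transactions that contain the item set."""
    s = set(itemset)
    return sum(t[i] * profit[i] for t in db if s.issubset(t) for i in s)


small, large = {"I5"}, {"I2", "I5"}
print(support(small), support(large))  # 8 3   (support shrinks)
print(utility(small), utility(large))  # 50 240 (utility grows)
```

With a utility threshold of, say, 100, pruning {I5} because it
falls below the threshold would wrongly discard its superset
{I2, I5}.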

In 2004, Yao defined the problem of utility mining formally.
The work defines the terms transaction utility and external
utility of an item set. The mathematical model of utility
mining was then defined based on two properties, the utility
bound and the support bound.

In 2006 and 2007, Yao defined the utility mining problem as
one of the cases of constraint mining. This work shows that
the downward closure property used in the standard classical
algorithm and the convertible constraint property are not
directly applicable to the utility mining problem.

In 2008, Li proposed two efficient one-pass algorithms,
MHUI-BIT and MHUI-TID, for mining high utility item sets
from data streams within a transaction-sensitive sliding
window. Liu et al. proposed a Two-Phase algorithm for
finding high utility item sets.

In 2009, Shankar presented a novel algorithm, Fast Utility
Mining (FUM), which finds all high utility item sets within
the given utility constraint threshold.

In 2010, Vincent S. Tseng et al. proposed a data structure
named UP-Tree and then described a new algorithm, called
UP-Growth, together with its framework.

In 2012, Mengchi Liu et al. presented "Mining High Utility
Item sets without Candidate Generation".

In 2013, Arumugam P et al. presented "Advance Mining Of
High Utility Item sets In Transactional Data".

In 2014, More Rani N. and Anbhule Reshma V. presented "Mining
High Utility Item sets From Transaction Database" [13,14].

VI. PROBLEM STATEMENT 

In the utility mining process, a very large number of
candidates is generated. The useless candidates slow down
execution, because we have to prune them and keep only the
item sets that satisfy the threshold value. Improving pruning
strategies is therefore a difficult task.
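
To make the size of the search space concrete, the following
sketch mines the example data of Table 1 and Table 2 by brute
force: every non-empty subset of the n items is a candidate,
so there are 2^n - 1 of them. The threshold 120 is a
hypothetical value chosen for illustration, and the function
names are not from the paper.

```python
from itertools import combinations

PROFIT = {"I1": 3, "I2": 10, "I3": 1, "I4": 6, "I5": 5}  # Table 2
DB = [  # Table 1 quantities for T1..T10, zero entries omitted
    {"I3": 18, "I5": 1},
    {"I2": 6, "I4": 1, "I5": 1},
    {"I1": 2, "I3": 1, "I5": 1},
    {"I1": 1, "I4": 1, "I5": 1},
    {"I3": 4, "I5": 2},
    {"I1": 1, "I2": 1},
    {"I2": 10, "I4": 1, "I5": 1},
    {"I1": 3, "I3": 25, "I4": 3, "I5": 1},
    {"I1": 1, "I2": 1},
    {"I2": 6, "I3": 2, "I5": 2},
]


def utility(itemset, db=DB, profit=PROFIT):
    """u(S) over all transactions that contain the item set."""
    return sum(t[i] * profit[i] for t in db if itemset.issubset(t) for i in itemset)


def brute_force(min_util, db=DB, profit=PROFIT):
    """Enumerate every non-empty item set and keep the high utility ones."""
    items = sorted(profit)
    candidates = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
    ]
    scored = {c: utility(c, db, profit) for c in candidates}
    high = {c: u for c, u in scored.items() if u >= min_util}
    return candidates, high


candidates, high = brute_force(min_util=120)  # hypothetical threshold
print(len(candidates), len(high))  # 31 candidates, 4 high utility item sets
```

Only 4 of the 31 candidates survive here, and the candidate
count doubles with every additional item, which is why
pruning matters.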

In previous algorithms, the function used for calculating
utility is also inefficient, because some algorithms are
based on the expected utility mining model and others on the
transaction-weighted utility model. Improving accuracy is
also a challenge.

VII. PROPOSED ALGORITHM 

We propose an efficient method that combines reducing the
cost of database scans through transaction merging with
pruning the search space using utility and local utility.
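
The merging rule is not spelled out in the text. As one
possible reading, the sketch below assumes that transactions
containing exactly the same items are merged by adding up
their quantities, which leaves every item set utility
unchanged while shortening the database that has to be
scanned. All names are illustrative.

```python
from collections import defaultdict

PROFIT = {"I1": 3, "I2": 10, "I3": 1, "I4": 6, "I5": 5}  # Table 2
DB = [  # Table 1 quantities for T1..T10, zero entries omitted
    {"I3": 18, "I5": 1},
    {"I2": 6, "I4": 1, "I5": 1},
    {"I1": 2, "I3": 1, "I5": 1},
    {"I1": 1, "I4": 1, "I5": 1},
    {"I3": 4, "I5": 2},
    {"I1": 1, "I2": 1},
    {"I2": 10, "I4": 1, "I5": 1},
    {"I1": 3, "I3": 25, "I4": 3, "I5": 1},
    {"I1": 1, "I2": 1},
    {"I2": 6, "I3": 2, "I5": 2},
]


def merge_transactions(db):
    """Merge transactions with identical item sets (assumed rule)."""
    merged = defaultdict(lambda: defaultdict(int))
    for t in db:
        key = frozenset(t)  # the set of items, ignoring quantities
        for item, qty in t.items():
            merged[key][item] += qty
    return [dict(t) for t in merged.values()]


def database_utility(db, profit=PROFIT):
    return sum(qty * profit[item] for t in db for item, qty in t.items())


merged = merge_transactions(DB)
print(len(DB), "->", len(merged))  # 10 -> 7
print(database_utility(DB), database_utility(merged))  # 400 400
```

In Table 1 the pairs (T1, T5), (T2, T7) and (T6, T9) share
their item sets, so ten transactions shrink to seven. Support
counts are lost by this merge unless a count is stored per
merged transaction.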

1. Calculate the transaction-weighted utility value of each
   single item using the twu(X) formula.

2. Generate the high utility 1-item sets by comparing the
   transaction-weighted utility value of each 1-item set with
   the given utility threshold value δ.

3. Update the transaction-weighted utilization table by
   subtracting the utility values of the deleted 1-item sets.

4. While (|C_k| > 0 and k <= K), that is, while more
   candidates remain:

   a. Generate the candidate set for the next level and
      compute the transaction-weighted utility value of each
      candidate item set using the twu(X) formula.
   b. Generate the high utility item sets of that level by
      comparing the transaction-weighted utility value of
      each candidate item set with the given utility
      threshold value δ.

5. Generate all high utility item sets.
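
The steps above can be sketched as a level-wise search. This
is an illustration under stated assumptions, not the authors'
implementation: twu(X) is taken to be the usual sum of u(T)
over the transactions that contain X (the text names the
formula without stating it), the table update is read as
removing pruned items and lowering u(T) accordingly, a final
exact utility check is added, and the threshold 120 is
hypothetical. All identifiers are illustrative.

```python
from itertools import combinations

PROFIT = {"I1": 3, "I2": 10, "I3": 1, "I4": 6, "I5": 5}  # Table 2
DB = [  # Table 1 quantities for T1..T10, zero entries omitted
    {"I3": 18, "I5": 1},
    {"I2": 6, "I4": 1, "I5": 1},
    {"I1": 2, "I3": 1, "I5": 1},
    {"I1": 1, "I4": 1, "I5": 1},
    {"I3": 4, "I5": 2},
    {"I1": 1, "I2": 1},
    {"I2": 10, "I4": 1, "I5": 1},
    {"I1": 3, "I3": 25, "I4": 3, "I5": 1},
    {"I1": 1, "I2": 1},
    {"I2": 6, "I3": 2, "I5": 2},
]


def transaction_utilities(db, profit):
    return [sum(q * profit[i] for i, q in t.items()) for t in db]


def twu(itemset, db, tu):
    """Assumed TWU: sum of u(T) over transactions containing the set."""
    return sum(u for t, u in zip(db, tu) if itemset.issubset(t))


def mine_high_utility(db, profit, min_util, max_len=None):
    tu = transaction_utilities(db, profit)
    # Steps 1-2: single items whose TWU reaches the threshold.
    items = sorted({i for t in db for i in t})
    level = [frozenset([i]) for i in items
             if twu(frozenset([i]), db, tu) >= min_util]
    # Step 3: delete pruned items; u(T) drops by their utility.
    kept = {i for s in level for i in s}
    db = [{i: q for i, q in t.items() if i in kept} for t in db]
    tu = transaction_utilities(db, profit)
    # Step 4: level-wise candidate generation with TWU pruning.
    candidates, k = list(level), 1
    while level and (max_len is None or k < max_len):
        k += 1
        prev = set(level)
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(s) in prev for s in combinations(c, k - 1))
                 and twu(c, db, tu) >= min_util]
        candidates.extend(level)
    # Step 5: exact utility check on the surviving candidates.
    result = {}
    for c in candidates:
        u = sum(t[i] * profit[i] for t in db if c.issubset(t) for i in c)
        if u >= min_util:
            result[c] = u
    return result


high = mine_high_utility(DB, PROFIT, min_util=120)
for s in sorted(high, key=lambda s: (len(s), sorted(s))):
    print(sorted(s), high[s])
```

On the example data this keeps {I2}, {I2, I4}, {I2, I5} and
{I2, I4, I5}. Item I1 is dropped after the first level because
its TWU of 109 is below the threshold, so no item set
containing I1 is ever generated.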

VIII. EXPERIMENTAL ANALYSIS 

We evaluate the performance of the proposed algorithm and
compare it with the iFUM and TP (Two-Phase) algorithms. The
experiments were performed on an Intel i3 processor (2.5 GHz,
4 MB cache) with 2 GB of main memory and 400 GB of secondary
storage, running Windows XP. The algorithms are implemented
in C# on the .NET Framework, version 4.0.1. Both datasets
used to evaluate the performance of the algorithms are
synthetic.



[Figure: comparison of iFUM, TP and the proposed method; axis label: Steps]


Fig. 2: Comparison graph 

IX. CONCLUSION 

The Mining Expected Utility and Two-Phase algorithms, and
several others, mine high utility item sets very efficiently.
However, these algorithms need to be enhanced so that they
can be applied to large datasets. The complexity of frequent
pattern mining algorithms depends on several factors, such as
execution time and I/O cost. The proposed method reduces
candidate generation at different stages.